
fix: add Athena-specific edr_drop_schema and always-run schema cleanup step#960

Merged
haritamar merged 4 commits into master from
devin/1772558962-fix-athena-cleanup-iceberg
Mar 3, 2026

Conversation


@devin-ai-integration devin-ai-integration Bot commented Mar 3, 2026

fix: add Athena-specific edr_drop_schema and always-run schema cleanup step

Summary

The nightly cleanup job fails on Athena because DROP SCHEMA … CASCADE (issued by the default edr_drop_schema) errors when the schema contains Iceberg tables:

Exception occurs when attempting to drop Iceberg table dbt_models__tmp_20260228175018329843
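For context, the statement that fails is the schema-level CASCADE drop. A sketch of its shape (the schema name is a placeholder, not taken from the actual run):

```sql
-- Illustrative only: the kind of statement the default edr_drop_schema
-- issues, which Athena rejects when the schema contains Iceberg tables.
DROP SCHEMA IF EXISTS test_schema_gw0 CASCADE;
```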

This adds an athena__edr_drop_schema dispatch that works around the issue by:

  1. Listing all relations in the schema via adapter.list_relations_without_caching()
  2. Dropping each relation individually via adapter.drop_relation() (the dbt-athena adapter already handles Iceberg vs Hive table differences internally — it uses the Glue API path by default)
  3. Dropping the now-empty schema
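The three steps above can be sketched as a dbt macro. This is a hedged illustration based on the description, not the merged code; the `api.Relation.create` usage and the `dbt.drop_schema` call site are assumptions:

```sql
{# Illustrative sketch of the Athena dispatch described above;
   not the merged implementation. #}
{% macro athena__edr_drop_schema(database_name, schema_name) %}
    {% set schema_relation = api.Relation.create(database=database_name, schema=schema_name) %}
    {# 1. List all relations in the schema, bypassing dbt's relation cache #}
    {% set relations = adapter.list_relations_without_caching(schema_relation) %}
    {# 2. Drop each relation individually; dbt-athena picks the right drop
       path for Iceberg vs Hive tables (Glue API by default) #}
    {% for relation in relations %}
        {% do adapter.drop_relation(relation) %}
    {% endfor %}
    {# 3. Drop the now-empty schema #}
    {% do dbt.drop_schema(schema_relation) %}
{% endmacro %}
```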

Additionally, stale schemas were accumulating across all adapters because schemas are only cleaned up during pytest's session teardown (--clear-on-end), which doesn't run when jobs are cancelled (cancel-in-progress: true) or crash. This PR also adds:

  • A drop_test_schemas macro that drops all schemas a CI run may have created (base schema + xdist worker schemas _gw0 through _gw7, each with its _elementary counterpart)
  • An if: always() step in test-warehouse.yml that calls this macro — only for cloud targets (snowflake, bigquery, redshift, databricks_catalog, athena), since local adapters (postgres, clickhouse, trino, dremio, spark, duckdb) don't persist schemas between runs. This matches the pattern already used in the elementary repo's test workflow.
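A minimal sketch of what such a drop_test_schemas macro could look like, assuming the schema naming described above (base schema, `_gw0` through `_gw7` worker suffixes, and an `_elementary` counterpart for each); the `elementary.edr_drop_schema` call site is an assumption:

```sql
{# Hypothetical sketch based on the PR description; the schema naming and
   the edr_drop_schema namespace are assumptions. #}
{% macro drop_test_schemas(num_workers=8) %}
    {% set suffixes = [''] %}
    {% for i in range(num_workers) %}
        {% do suffixes.append('_gw' ~ i) %}
    {% endfor %}
    {% for suffix in suffixes %}
        {# base test schema for this worker, plus its _elementary counterpart #}
        {% do elementary.edr_drop_schema(target.database, target.schema ~ suffix) %}
        {% do elementary.edr_drop_schema(target.database, target.schema ~ '_elementary' ~ suffix) %}
    {% endfor %}
{% endmacro %}
```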

Updates since last revision

  • Fixed the dbt binary path in the new "Drop test schemas" workflow step to use ~/.local/bin/dbt when inputs.dbt-version == 'fusion', matching the conditional used by other steps (e.g., "Install dependencies", "Check DWH connection").
  • Scoped the "Drop test schemas" step to cloud targets only (snowflake, bigquery, redshift, databricks_catalog, athena), per reviewer feedback that local adapters don't persist schemas.

Review & Testing Checklist for Human

  • Verify schema name construction in drop_test_schemas: The macro constructs elementary schema names as <base_schema>_elementary<suffix>. Confirm this matches what generate_schema_name("elementary", node) + the xdist suffix actually produces. If the naming convention differs, schemas will be silently skipped (no error, just no-op).
  • Verify adapter.drop_relation() works in run-operation context: In dbt-athena, athena__drop_relation reads config.get('native_drop', default=false). In a run-operation context there's no model config, so it should default to false and use the Glue API path (drop_relation_glue). Confirm this is the correct/working path for Iceberg tables.
  • Hardcoded cloud target list: The workflow step uses contains(fromJSON('["snowflake","bigquery","redshift","databricks_catalog","athena"]'), ...). If new cloud targets are added to test-all-warehouses.yml in the future, this list will need manual updating.
  • No error handling on individual relation drops: If adapter.drop_relation() fails on one table (e.g., Iceberg DROP timeout), the macro will abort the current schema's loop. The workflow step has continue-on-error: true as a safety net, but remaining schemas in the drop_test_schemas loop won't be attempted either.
  • End-to-end validation: Re-run the cleanup workflow manually targeting Athena, and trigger a test-warehouse run to verify the new Drop test schemas step works.
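Putting the checklist items together, the workflow step could look roughly like this. The target list, `if: always()`, and `continue-on-error` come from the description above; the input name `warehouse-type` and the step layout are assumptions:

```yaml
# Sketch of the always-run cleanup step; not the merged workflow file.
- name: Drop test schemas
  # Runs even after cancellation/failure, but only for cloud targets.
  if: always() && contains(fromJSON('["snowflake","bigquery","redshift","databricks_catalog","athena"]'), inputs.warehouse-type)
  continue-on-error: true   # a failed drop must not fail the job
  working-directory: integration_tests/dbt_project
  run: dbt run-operation elementary_tests.drop_test_schemas
```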

Notes

  • The if: always() cleanup step is best-effort — if the job is forcibly killed by cancel-in-progress before reaching this step, schemas will still leak. The nightly cleanup job remains the ultimate safety net.
  • Requested by: @haritamar
  • Devin Session

Summary by CodeRabbit

  • New Features

    • Added public environment-cleanup macros to drop test schemas and to safely drop Athena schemas during cleanup.
    • Added a CI workflow step to run the cleanup step during test runs.
  • Bug Fixes

    • Improved reliability of schema drops for Athena environments (handles Iceberg-related drop failures).
  • Documentation

    • Added brief usage/context notes for the new cleanup macros.

Athena's SQL DROP SCHEMA ... CASCADE fails when the schema contains
Iceberg tables. Add an athena__edr_drop_schema dispatch that drops
every relation individually first (the adapter handles Iceberg vs Hive
differences in its drop_relation implementation) before removing the
now-empty schema.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring


github-actions Bot commented Mar 3, 2026

👋 @devin-ai-integration[bot]
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.


coderabbitai Bot commented Mar 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 010f76e and ec9e3a0.

📒 Files selected for processing (1)
  • .github/workflows/test-warehouse.yml
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/test-warehouse.yml

📝 Walkthrough

Walkthrough

Adds two dbt macros — drop_test_schemas(num_workers=8) and athena__edr_drop_schema(database_name, schema_name) — to drop test and elementary schemas (dropping relations first for Athena). Also adds a CI workflow step to run elementary_tests.drop_test_schemas during tests.

Changes

Cohort / File(s) Summary
Schema cleanup macros
integration_tests/dbt_project/macros/clear_env.sql
Added drop_test_schemas(num_workers=8) to iterate base and worker-specific test schemas and call schema drops; added athena__edr_drop_schema(database_name, schema_name) which lists relations and drops each before calling dbt.drop_schema to avoid Iceberg/Athena drop failures.
CI workflow step
.github/workflows/test-warehouse.yml
Added "Drop test schemas" step that runs the elementary_tests.drop_test_schemas dbt operation from the tests dir, scoped by warehouse type; configured to always run and continue-on-error.

Sequence Diagram(s)

sequenceDiagram
  participant CI as CI workflow (GitHub)
  participant DBT as dbt operation
  participant Adapter as DB adapter
  participant Athena as Athena / Metastore

  CI->>DBT: run operation elementary_tests.drop_test_schemas
  DBT->>Adapter: call athena__edr_drop_schema(database, schema)
  Adapter->>Athena: list_relations_without_caching(database, schema)
  Athena-->>Adapter: return relations
  loop for each relation
    Adapter->>Athena: drop_relation(relation)
    Athena-->>Adapter: relation dropped
  end
  DBT->>DBT: dbt.drop_schema(database, schema)
  DBT-->>CI: operation complete
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐇 I hop through catalogs, nibble each name,
I list and drop softly, no two the same.
One by one I tidy, then close the door—
Test schemas vanish, the meadow’s clean once more. 🌿🥕

🚥 Pre-merge checks | ✅ Passed checks (3 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title clearly and concisely summarizes the two main changes: adding an Athena-specific schema drop macro to fix Iceberg-related failures and ensuring schema cleanup always runs in the CI workflow.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage; check skipped.


Add a drop_test_schemas macro that drops all schemas created by a CI
test run (base + xdist worker schemas _gw0 through _gw7).

Add an 'if: always()' step to test-warehouse.yml that calls this macro,
ensuring schemas are cleaned up even when pytest is cancelled or crashes
before its own teardown runs.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
@devin-ai-integration devin-ai-integration Bot changed the title fix: add Athena-specific edr_drop_schema to handle Iceberg tables fix: add Athena-specific edr_drop_schema and always-run schema cleanup step Mar 3, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/test-warehouse.yml:
- Around line 206-213: The "Drop test schemas" job step uses the plain dbt
binary but should follow the earlier conditional that uses ~/.local/bin/dbt when
inputs.dbt-version == 'fusion'; update the run command that calls "dbt
run-operation elementary_tests.drop_test_schemas" to conditionally invoke
"~/.local/bin/dbt" when inputs.dbt-version == 'fusion' (matching the earlier
steps) so the same dbt binary is used for fusion builds and avoid invoking the
wrong dbt installation.
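The fix the comment asks for can be sketched as a shell-level switch on the dbt binary. The `~/.local/bin/dbt` path and the `fusion` value come from the comment itself; the surrounding step layout is illustrative:

```yaml
# Illustrative: select the fusion dbt binary the same way earlier steps do.
- name: Drop test schemas
  if: always()
  continue-on-error: true
  working-directory: integration_tests/dbt_project
  run: |
    if [ "${{ inputs.dbt-version }}" = "fusion" ]; then
      ~/.local/bin/dbt run-operation elementary_tests.drop_test_schemas
    else
      dbt run-operation elementary_tests.drop_test_schemas
    fi
```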

ℹ️ Review info


📥 Commits

Reviewing files that changed from the base of the PR and between 2e61b62 and 3a78eb7.

📒 Files selected for processing (2)
  • .github/workflows/test-warehouse.yml
  • integration_tests/dbt_project/macros/clear_env.sql
🚧 Files skipped from review as they are similar to previous changes (1)
  • integration_tests/dbt_project/macros/clear_env.sql

Comment thread .github/workflows/test-warehouse.yml
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
integration_tests/dbt_project/macros/clear_env.sql (1)

75-77: Individual relation drop failures will halt the cleanup loop.

If adapter.drop_relation(relation) fails for any relation (e.g., permissions, transient error), the remaining relations won't be dropped. While the workflow uses continue-on-error, partial cleanup within this macro could leave orphaned relations.

Consider wrapping individual drops to continue on failure:

♻️ Suggested approach (if dbt/Jinja supports it)
     {% for relation in relations %}
-        {% do adapter.drop_relation(relation) %}
+        {% do log("Dropping relation: " ~ relation, info=true) %}
+        {# Note: dbt macros don't have native try/catch; consider adapter.execute with error handling if available #}
+        {% do adapter.drop_relation(relation) %}
     {% endfor %}

If robust error handling is needed, this may require a Python-based approach or accepting the current behavior with workflow-level continue-on-error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration_tests/dbt_project/macros/clear_env.sql` around lines 75 - 77, The
loop over relations currently calls adapter.drop_relation(relation) directly so
a single failure aborts the macro; modify the macro to handle per-relation
failures by wrapping each adapter.drop_relation(relation) call in a
try/except-style guard (or Jinja-equivalent error handler) that catches the
error, logs or records the failed relation, and continues the loop so remaining
relations are attempted; if Jinja/dbt templating cannot catch exceptions
reliably, move this logic into a Python macro or helper that iterates relations
and calls adapter.drop_relation inside a try/except to ensure individual drop
failures do not stop the cleanup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@integration_tests/dbt_project/macros/clear_env.sql`:
- Around line 75-77: The loop over relations currently calls
adapter.drop_relation(relation) directly so a single failure aborts the macro;
modify the macro to handle per-relation failures by wrapping each
adapter.drop_relation(relation) call in a try/except-style guard (or
Jinja-equivalent error handler) that catches the error, logs or records the
failed relation, and continues the loop so remaining relations are attempted; if
Jinja/dbt templating cannot catch exceptions reliably, move this logic into a
Python macro or helper that iterates relations and calls adapter.drop_relation
inside a try/except to ensure individual drop failures do not stop the cleanup.

ℹ️ Review info


📥 Commits

Reviewing files that changed from the base of the PR and between 2e61b62 and 010f76e.

📒 Files selected for processing (2)
  • .github/workflows/test-warehouse.yml
  • integration_tests/dbt_project/macros/clear_env.sql
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/test-warehouse.yml

Comment thread .github/workflows/test-warehouse.yml
…'t persist)

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
@haritamar haritamar merged commit d88dc70 into master Mar 3, 2026
24 checks passed
@haritamar haritamar deleted the devin/1772558962-fix-athena-cleanup-iceberg branch March 3, 2026 20:44